3D Recursive Gaussian IIR on GPUs and FPGAs: A Case Study for Accelerating Bandwidth-Bounded Applications

Authors

Jason Cong

Muhuan Huang

Yi Zou

Abstract

GPU devices typically have higher off-chip bandwidth than FPGA-based systems, so a GPU would usually be expected to perform better on bandwidth-bounded, massively parallel applications. In this paper we present our implementations of a 3D recursive Gaussian IIR on multicore CPU, many-core GPU and multi-FPGA platforms. Our baseline implementation on the CPU features the smallest arithmetic computation (2 MADDs per dimension). Since this application is clearly bandwidth bounded, we show that the differences in the memory subsystems of the different platforms require different bandwidth optimization techniques. Our implementations on the GPU and FPGA platforms show 26X and 33X speedups, respectively, over the optimized single-thread code on the CPU.

Similar resources

An Adaptive Self-adjusting Bandwidth Bandpass Filter without IIR Bias

In this paper we introduce a simple, computationally inexpensive, adaptive recursive structure for enhancing bandpass signals highly corrupted by broad-band noise. This adaptive algorithm, enhancing input signals, enables us to estimate the center frequency and the bandwidth of the input signal. In addition, an important feature of the proposed structure is that the conventional bias existing ...

Accelerating high-order WENO schemes using two heterogeneous GPUs

A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code, written in the CUDA programming language, is developed by modifying a single-GPU code. The OpenMP library is used for parall...

A Configurable VHDL Template for Parallelization of 3D Stencil Codes on FPGAs

2D and 3D stencil code applications are very common in scientific computing, but their performance is mostly limited by the memory bandwidth. Elaborate on-chip buffering techniques minimize memory transfers, but they cannot be directly realized on fixed general-purpose processors or GPUs. FPGAs instead offer flexibility regarding the processing scheme, the degree of parallelism and the numerical...

Are FPGAs Suitable for Edge Computing?

The rapid growth of Internet-of-things (IoT) and artificial intelligence applications has called forth a new computing paradigm: edge computing. In this paper, we study the suitability of deploying FPGAs for edge computing from the perspectives of throughput sensitivity to workload size, architectural adaptiveness to algorithm characteristics, and energy efficiency. This goal is accomplished by...

Publication date: 2011
